Parametric Analysis

Authors

Dr. Muhammad Za’im bin Mohd Samsuri

Dr. Tengku Muhammad Hudzaifah bin Tengku Mokhtar

Dr. Muhammad Abdul Hafiz bin Kamarul Zaman

Published

June 15, 2025

knitr::include_graphics("group.gif")

1 Load necessary libraries

library(survival)
library(survminer)
library(dplyr)
library(ggplot2)
library(tidyr)
library(broom)
library(tidyverse)
library(readr)
library(readxl)
library(janitor)
library(gtsummary)
library(labelled)

2 Load the dataset

data1 <- read_excel("survival_data_esrd.xlsx")
colnames(data1) <- tolower(colnames(data1))
view(data1)

2.1 Describe data

In a simulated dataset of 456 patients with End-Stage Renal Disease (ESRD), patient survival time was measured to assess the duration from the onset of ESRD to death or censoring. Survival time was recorded in months and represents the time under renal replacement therapy. The dataset was designed to evaluate factors influencing mortality among ESRD patients. The variables are defined as follows:

ID: Patient ID time_to_event_months: The time (in months) from the onset of ESRD until the patient died or was censored event: Indicates whether the patient died (coded 1) or was censored (coded 0)
age: The patient’s age at baseline (in years)
gender: Indicates the patient’s gender (“Male” or “Female”)
treatment_group: Indicates the dialysis modality received, either Hemodialysis (HD) or Continuous Ambulatory Peritoneal Dialysis (CAPD)
bmi: The patient’s Body Mass Index (BMI), measured in kg/m²
smoker_status: Indicates whether the patient was a current smoker (“Yes”) or not (“No”) comorbidity_score: A numeric score (0–10) reflecting the patient’s burden of comorbid conditions

3 Data preparation

data1<-data1 %>% mutate_if(is.character,~ as_factor(.))
data1$time_to_event_months <- round(data1$time_to_event_months)
glimpse(data1)
Rows: 456
Columns: 8
$ age                  <dbl> 65, 58, 67, 78, 57, 57, 78, 69, 54, 66, 54, 54, 6…
$ gender               <fct> Male, Female, Male, Female, Male, Male, Male, Fem…
$ treatment_group      <fct> HD, HD, HD, HD, HD, HD, CAPD, HD, HD, CAPD, HD, C…
$ bmi                  <dbl> 24.0, 22.3, 21.0, 23.9, 32.2, 27.6, 22.7, 27.3, 3…
$ smoker_status        <fct> No, No, No, No, No, No, Yes, No, No, No, No, No, …
$ comorbidity_score    <dbl> 1, 1, 2, 3, 2, 1, 5, 5, 2, 1, 3, 0, 2, 6, 7, 3, 3…
$ time_to_event_months <dbl> 59, 45, 186, 161, 154, 3, 120, 55, 188, 217, 67, …
$ event                <dbl> 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1…

3.1 Summary statistics

summary(data1)
      age           gender    treatment_group      bmi        smoker_status
 Min.   :21.00   Male  :235   HD  :269        Min.   :15.00   No :322      
 1st Qu.:51.00   Female:221   CAPD:187        1st Qu.:22.57   Yes:134      
 Median :60.00                                Median :25.30                
 Mean   :59.62                                Mean   :25.35                
 3rd Qu.:67.00                                3rd Qu.:27.90                
 Max.   :90.00                                Max.   :35.30                
 comorbidity_score time_to_event_months     event       
 Min.   :0.000     Min.   :   1.0       Min.   :0.0000  
 1st Qu.:2.000     1st Qu.:  40.0       1st Qu.:0.0000  
 Median :3.000     Median :  97.0       Median :1.0000  
 Mean   :2.952     Mean   : 153.3       Mean   :0.5526  
 3rd Qu.:4.000     3rd Qu.: 192.0       3rd Qu.:1.0000  
 Max.   :9.000     Max.   :1272.0       Max.   :1.0000  
tbl_summary(data1,
            statistic = list(all_continuous() ~ "{median} ({p25}, {p75})",
                             all_categorical() ~ "{n} ({p}%)"),
            digits = all_continuous() ~ 1,
            missing = "no") %>%
  modify_header(label ~ "**Characteristic**") %>%
  modify_caption("**Table: Patient Characteristics (N = 456)**") %>%
  bold_labels()
Table: Patient Characteristics (N = 456)
Characteristic N = 4561
age 60.0 (51.0, 67.0)
gender
    Male 235 (52%)
    Female 221 (48%)
treatment_group
    HD 269 (59%)
    CAPD 187 (41%)
bmi 25.3 (22.6, 27.9)
smoker_status 134 (29%)
comorbidity_score 3.0 (2.0, 4.0)
time_to_event_months 97.0 (40.0, 193.0)
event 252 (55%)
1 Median (Q1, Q3); n (%)

This study included a total of 456 patients. The median age was 60 years, with most patients between 51 and 67 years old. There were slightly more males (52%) than females (48%). In terms of dialysis modality, 59% of the patients were on hemodialysis, while 41% were on continuous ambulatory peritoneal dialysis (CAPD). The median body mass index (BMI) was 25.3 kg/m², indicating that most patients were in the normal to slightly overweight range. About 29% of the patients were current smokers. The median comorbidity score was 3, suggesting that many patients had at least a few other health conditions. The median follow-up period was 97 months. During the study, 252 patients (55%) died. # Estimation of AFT models

3.2 Exponential model

surv.mod <- Surv(time = data1$time_to_event_months, event = data1$event == 1)
exp.mod <- survreg(surv.mod ~ age + gender + treatment_group + bmi + smoker_status + comorbidity_score, data = data1, dist='exponential')
summary(exp.mod)

Call:
survreg(formula = surv.mod ~ age + gender + treatment_group + 
    bmi + smoker_status + comorbidity_score, data = data1, dist = "exponential")
                       Value Std. Error     z       p
(Intercept)          5.92160    0.56593 10.46 < 2e-16
age                 -0.00235    0.00553 -0.42  0.6714
genderFemale         0.15546    0.12700  1.22  0.2209
treatment_groupCAPD  0.34996    0.13009  2.69  0.0071
bmi                  0.00307    0.01638  0.19  0.8514
smoker_statusYes    -1.05318    0.13280 -7.93 2.2e-15
comorbidity_score   -0.06829    0.03500 -1.95  0.0510

Scale fixed at 1 

Exponential distribution
Loglik(model)= -1634.9   Loglik(intercept only)= -1669.7
    Chisq= 69.63 on 6 degrees of freedom, p= 4.9e-13 
Number of Newton-Raphson Iterations: 5 
n= 456 

3.2.1 Interpretation of the exponential model

Patients on CAPD had significantly longer survival than those on HD, with a time ratio (TR) of 1.42 (p = 0.007), indicating a 42% longer survival time. Smokers had much shorter survival, with a TR of 0.35 (p < 0.001), while each unit increase in comorbidity score was associated with a 6.6% reduction in survival time (TR = 0.93, p = 0.051). Age, gender, and BMI showed no significant association with survival (p > 0.05). The corresponding hazard ratios (HR) derived from the model were 0.70 for CAPD, 2.87 for smokers, and 1.07 per unit increase in comorbidity, indicating that CAPD patients had 30% lower risk of death, smokers had nearly 3 times higher risk, and higher comorbidity modestly increased the risk of death.

3.3 Weibull model

wei.mod <- survreg(surv.mod ~ age + gender + treatment_group + bmi + smoker_status + comorbidity_score, data = data1, dist = 'weibull')
summary(wei.mod)

Call:
survreg(formula = surv.mod ~ age + gender + treatment_group + 
    bmi + smoker_status + comorbidity_score, data = data1, dist = "weibull")
                       Value Std. Error     z       p
(Intercept)          5.92451    0.54708 10.83 < 2e-16
age                 -0.00251    0.00536 -0.47  0.6389
genderFemale         0.15307    0.12283  1.25  0.2127
treatment_groupCAPD  0.34557    0.12592  2.74  0.0061
bmi                  0.00304    0.01584  0.19  0.8481
smoker_statusYes    -1.04435    0.12894 -8.10 5.5e-16
comorbidity_score   -0.06745    0.03379 -2.00  0.0459
Log(scale)          -0.03414    0.04830 -0.71  0.4797

Scale= 0.966 

Weibull distribution
Loglik(model)= -1634.6   Loglik(intercept only)= -1668.4
    Chisq= 67.58 on 6 degrees of freedom, p= 1.3e-12 
Number of Newton-Raphson Iterations: 7 
n= 456 

The Weibull scale parameter is estimated at 0.966, giving a shape parameter of 1/0.966=1.0352 . Since the shape parameter is greater than 1, it indicates that the hazard of death increases over time.

3.3.1 Interpretation of the Weibull model

The estimated log time to death in patients receiving CAPD treatment (compared to HD) was 0.34557. The Acceleration Factor (AF) or Time Ratio (TR) is exp ⁡(0.34557)=1.41256 exp(0.34557)=1.41256

This means that patients on CAPD have 1.41 times longer survival time than those on HD. In other words, CAPD is associated with a longer time to death compared to HD.

The estimated log time to death for current smokers (compared to non-smokers) was −1.04435, with an AF = exp(−1.04435) = 0.35179. This indicates that current smokers have about 35% of the survival time of non-smokers — i.e., smoking shortens survival substantially.

For comorbidity score, the estimated log time was −0.06745, giving an AF = exp(−0.06745) = 0.9348. This means that for each additional point in comorbidity score, survival time decreases by about 6.5%, suggesting higher comorbidity leads to shorter time to death.

Other variables such as age, gender, and BMI were not statistically significant (p > 0.05), indicating no strong evidence of their association with survival time in this model.

3.3.2 Converting to TR and HR in Weibull model

library(SurvRegCensCov)
ConvertWeibull(wei.mod,conf.level = 0.95)
$vars
                        Estimate          SE
lambda               0.002176019 0.001395541
gamma                1.034728987 0.049977699
age                  0.002600289 0.005549084
genderFemale        -0.158388473 0.127121806
treatment_groupCAPD -0.357574701 0.130582205
bmi                 -0.003140769 0.016392173
smoker_statusYes     1.080617872 0.138576818
comorbidity_score    0.069790002 0.035007647

$HR
                           HR        LB        UB
age                 1.0026037 0.9917584 1.0135675
genderFemale        0.8535181 0.6652831 1.0950124
treatment_groupCAPD 0.6993705 0.5414464 0.9033563
bmi                 0.9968642 0.9653459 1.0294115
smoker_statusYes    2.9464995 2.2456888 3.8660118
comorbidity_score   1.0722830 1.0011770 1.1484391

$ETR
                          ETR        LB        UB
age                 0.9974901 0.9870734 1.0080169
genderFemale        1.1654094 0.9160596 1.4826317
treatment_groupCAPD 1.4127996 1.1038254 1.8082595
bmi                 1.0030400 0.9723747 1.0346723
smoker_statusYes    0.3519210 0.2733324 0.4531053
comorbidity_score   0.9347767 0.8748808 0.9987731

A parametric survival model was used to assess factors associated with time to death. Patients on CAPD had significantly longer survival compared to those on HD, with an expected time ratio (ETR) of 1.41 (95% CI: 1.10 to 1.81), indicating that CAPD patients lived about 41% longer on average. Smokers had significantly shorter survival times, with an ETR of 0.35 (95% CI: 0.27 to 0.45), suggesting their survival time was only 35% of that in non-smokers. Each additional comorbidity reduced survival by about 6.5% (ETR = 0.93, 95% CI: 0.87 to 1.00). Other variables including age, gender, and BMI showed no significant time effects. The corresponding hazard ratios indicated that CAPD patients had a 30% lower risk of death (HR = 0.70, 95% CI: 0.54 to 0.90), smokers had 2.95 times higher risk of death (HR = 2.95, 95% CI: 2.25 to 3.87), and each additional comorbidity increased the risk of death by 7% (HR = 1.07, 95% CI: 1.00 to 1.15). ####other model

library(flexsurv)
wei.mod.aft <- flexsurvreg(Surv(time_to_event_months, event) ~ age + gender + treatment_group + bmi + smoker_status + comorbidity_score, 
                           data = data1, dist = 'weibull')
wei.mod.aft
Call:
flexsurvreg(formula = Surv(time_to_event_months, event) ~ age + 
    gender + treatment_group + bmi + smoker_status + comorbidity_score, 
    data = data1, dist = "weibull")

Estimates: 
                     data mean  est        L95%       U95%       se       
shape                       NA   1.03e+00   9.41e-01   1.14e+00   5.00e-02
scale                       NA   3.74e+02   1.28e+02   1.09e+03   2.05e+02
age                   5.96e+01  -2.51e-03  -1.30e-02   7.98e-03   5.36e-03
genderFemale          4.85e-01   1.53e-01  -8.77e-02   3.94e-01   1.23e-01
treatment_groupCAPD   4.10e-01   3.46e-01   9.88e-02   5.92e-01   1.26e-01
bmi                   2.53e+01   3.04e-03  -2.80e-02   3.41e-02   1.58e-02
smoker_statusYes      2.94e-01  -1.04e+00  -1.30e+00  -7.92e-01   1.29e-01
comorbidity_score     2.95e+00  -6.74e-02  -1.34e-01  -1.23e-03   3.38e-02
                     exp(est)   L95%       U95%     
shape                       NA         NA         NA
scale                       NA         NA         NA
age                   9.97e-01   9.87e-01   1.01e+00
genderFemale          1.17e+00   9.16e-01   1.48e+00
treatment_groupCAPD   1.41e+00   1.10e+00   1.81e+00
bmi                   1.00e+00   9.72e-01   1.03e+00
smoker_statusYes      3.52e-01   2.73e-01   4.53e-01
comorbidity_score     9.35e-01   8.75e-01   9.99e-01

N = 456,  Events: 252,  Censored: 204
Total time at risk: 69922
Log-likelihood = -1634.618, df = 8
AIC = 3285.236

3.4 PH Model

wei.mod.ph <- flexsurvreg(Surv(time_to_event_months, event) ~age + gender + treatment_group + bmi + smoker_status + comorbidity_score, data = data1,dist = 'weibullPH')
wei.mod.ph
Call:
flexsurvreg(formula = Surv(time_to_event_months, event) ~ age + 
    gender + treatment_group + bmi + smoker_status + comorbidity_score, 
    data = data1, dist = "weibullPH")

Estimates: 
                     data mean  est        L95%       U95%       se       
shape                       NA   1.034729   0.941268   1.137470   0.049978
scale                       NA   0.002176   0.000619   0.007648   0.001396
age                  59.625000   0.002600  -0.008276   0.013476   0.005549
genderFemale          0.484649  -0.158388  -0.407543   0.090766   0.127122
treatment_groupCAPD   0.410088  -0.357575  -0.613511  -0.101638   0.130582
bmi                  25.347149  -0.003141  -0.035269   0.028987   0.016392
smoker_statusYes      0.293860   1.080618   0.809012   1.352223   0.138577
comorbidity_score     2.951754   0.069790   0.001176   0.138404   0.035008
                     exp(est)   L95%       U95%     
shape                       NA         NA         NA
scale                       NA         NA         NA
age                   1.002604   0.991758   1.013568
genderFemale          0.853518   0.665283   1.095012
treatment_groupCAPD   0.699370   0.541446   0.903356
bmi                   0.996864   0.965346   1.029412
smoker_statusYes      2.946500   2.245689   3.866012
comorbidity_score     1.072283   1.001177   1.148439

N = 456,  Events: 252,  Censored: 204
Total time at risk: 69922
Log-likelihood = -1634.618, df = 8
AIC = 3285.236

A Weibull proportional hazards model was used to evaluate the risk of death in relation to demographic and clinical variables. Patients on CAPD had a significantly lower risk of death compared to those on HD, with a hazard ratio (HR) of 0.70 (95% CI: 0.54 to 0.90), indicating a 30% risk reduction. Smokers had nearly 3 times higher risk of death than non-smokers (HR = 2.95, 95% CI: 2.25 to 3.87), while each unit increase in comorbidity score was associated with a 7% increase in mortality risk (HR = 1.07, 95% CI: 1.00 to 1.15). Age, gender, and BMI were not significantly associated with the hazard of death (p > 0.05). The shape parameter was approximately 1.03, suggesting the hazard function was roughly constant over time, consistent with the exponential distribution. ## Model adequacy for Weibull distribution

WeibullDiag(Surv(time = data1$time_to_event_months, event = data1$event == 1) ~ treatment_group, 
            data = data1)

WeibullDiag(Surv(time = data1$time_to_event_months, event = data1$event == 1) ~ smoker_status, 
            data = data1)

4 Prediction

4.1 Predict the survival time using weibull model

For example, predicted survival times were estimated at the 25th, 50th (median), and 75th percentiles for a hypothetical patient with the following characteristics:

Age = 50 years Gender = Female BMI = 25 kg/m² Treatment Group = Hemodialysis (HD) Comorbidity Score = 2.5 Smoker Status = Yes

new.data <- data.frame(age = 50, gender = "Female", bmi = 25 ,treatment_group = "HD", comorbidity_score = 2.5,smoker_status = "Yes")
quant.p <-c(0.25, 0.50, 0.75)
months.p <- predict(wei.mod, newdata = new.data, type = 'quantile', p=quant.p)
months.p
[1]  36.99420  86.54223 169.10423

The predicted survival times (in months) under the Weibull model are:

25th percentile: 37.0 months 50th percentile (Median survival time): 86.5 months 75th percentile: 169.1 months

This means that 25% of patients with this profile are expected to die before 37 months,50% before 86.5 months, and 75% before 169.1 months.

cbind(quant.p, months.p)
     quant.p  months.p
[1,]    0.25  36.99420
[2,]    0.50  86.54223
[3,]    0.75 169.10423

4.1.1 Plot the predicted survival time

plot(x = predict(wei.mod, newdata = new.data, type = "quantile", 
                 p = (1:98)/100), y = (1:98)/100 , type = "l")

4.2 Log-logistic model

logl.mod <- survreg(surv.mod ~ age + gender + treatment_group + bmi + smoker_status + comorbidity_score, data = data1, dist = 'loglogistic')
summary(logl.mod)

Call:
survreg(formula = surv.mod ~ age + gender + treatment_group + 
    bmi + smoker_status + comorbidity_score, data = data1, dist = "loglogistic")
                       Value Std. Error     z       p
(Intercept)          4.92612    0.60267  8.17 3.0e-16
age                  0.00194    0.00586  0.33  0.7404
genderFemale         0.22456    0.13578  1.65  0.0982
treatment_groupCAPD  0.39594    0.13837  2.86  0.0042
bmi                  0.01197    0.01711  0.70  0.4844
smoker_statusYes    -1.00505    0.14448 -6.96 3.5e-12
comorbidity_score   -0.06360    0.03972 -1.60  0.1093
Log(scale)          -0.30209    0.05185 -5.83 5.7e-09

Scale= 0.739 

Log logistic distribution
Loglik(model)= -1638.6   Loglik(intercept only)= -1667.2
    Chisq= 57.13 on 6 degrees of freedom, p= 1.7e-10 
Number of Newton-Raphson Iterations: 4 
n= 456 

The Acceleration Factor is exp(0.39594) = 1.49, which means that patients on CAPD had 1.49 times longer survival time compared to those on HD.

For smokers, the estimated log time ratio was –1.00505. The Acceleration Factor is exp(–1.00505) = 0.366, indicating that smokers had a shorter survival time by a factor of 0.366 compared to non-smokers — meaning they died more quickly.

Other variables such as age, gender, BMI, and comorbidity score did not show significant effects on survival time (p > 0.05). The scale parameter was 0.739, indicating moderate variation in survival distribution.

5 Checking the PH assumption

kmfit <- survfit(surv.mod ~ data1$treatment_group)
kmfit
Call: survfit(formula = surv.mod ~ data1$treatment_group)

                             n events median 0.95LCL 0.95UCL
data1$treatment_group=HD   269    152    154     127     186
data1$treatment_group=CAPD 187    100    212     167     335
plot(log(kmfit$time), log(-log(kmfit$surv)), col = c("red", "blue"), xlab = "log(time)", ylab = "log(-log(survival))")

The plot do not look like a straight lines. It is also not parallel. So it does not support PH assumption. Our PH asssumption is violated. Our log-logistic model (we did earlier) for example might not be appropriate.

https://github.com/thuzaifah/Advance-statisitical-assignment.git